Print LmodError when loading GCCcore-12.2.0-based modules on zen4 #841

Merged
8 commits merged into EESSI:2023.06-software.eessi.io on Jan 10, 2025

Conversation

casparvl
Collaborator

@casparvl casparvl commented Dec 10, 2024

Implements the idea from https://gitlab.com/eessi/support/-/issues/37#note_2159031831

However, this does not work yet: the first module to be installed that has GCCcore-12.2.0 as a dependency will try to load it (even with --module-only), and that load fails:

== creating module...
  >> generating module file @ /home/casparl/eessi/versions/2023.06/software/linux/x86_64/amd/zen4/modules/all/GCCcore/12.2.0.lua
== ... (took < 1 sec)
== permissions [skipped]
== packaging [skipped]
  >> running command:
        [started at: 2024-12-10 18:12:47]
        [working dir: /gpfs/home4/casparl/eessi/versions/2023.06/software/linux/x86_64/amd/zen4/modules]
        [output logged in /scratch-local/casparl.8987353/eb-bscq9hvh/easybuild-run_cmd-pxom1tke.log]
        bzip2 /home/casparl/eessi/versions/2023.06/software/linux/x86_64/amd/zen4/software/GCCcore/12.2.0/easybuild/easybuild-GCCcore-12.2.0-20241210.181247.log
  >> command completed: exit 0, ran in < 1s
== COMPLETED: Installation ended successfully (took 1 secs)
== Results of the build can be found in the log file(s) /home/casparl/eessi/versions/2023.06/software/linux/x86_64/amd/zen4/software/GCCcore/12.2.0/easybuild/easybuild-GCCcore-12.2.0-20241210.181247.log.bz2
== processing EasyBuild easyconfig /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen4/software/EasyBuild/4.9.4/easybuild/easyconfigs/p/pkgconf/pkgconf-1.9.3-GCCcore-12.2.0.eb
== building and installing pkgconf/1.9.3-GCCcore-12.2.0...
  >> installation prefix: /home/casparl/eessi/versions/2023.06/software/linux/x86_64/amd/zen4/software/pkgconf/1.9.3-GCCcore-12.2.0
== fetching files [skipped]
== creating build dir, resetting environment...
  >> build dir: /tmp/casparl/easybuild/build/pkgconf/1.9.3/GCCcore-12.2.0
== Running post-ready hook...
== ... (took < 1 sec)
== unpacking [skipped]
== patching [skipped]
== preparing...
== Running pre-prepare hook...
== ... (took < 1 sec)
== FAILED: Installation ended unsuccessfully (build directory: /tmp/casparl/easybuild/build/pkgconf/1.9.3/GCCcore-12.2.0): build failed (first 300 chars): Module command '/usr/share/lmod/lmod/libexec/lmod python show GCCcore/12.2.0' failed with exit code 1; stderr: Lmod has detected
the following error: Unable to load module because of error when evaluating modulefile:
     /home/casparl/eessi/versions/2023.06/software/linux/x86_64/amd/zen4/modules/al (took 0 secs)
== Results of the build can be found in the log file(s) /scratch-local/casparl.8987353/eb-bscq9hvh/easybuild-pkgconf-1.9.3-20241210.181248.DZrFb.log
0:00:02  1 out of 37 easyconfigs done: GCCcore/12.2.0 (OK)
ERROR: Build of /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen4/software/EasyBuild/4.9.4/easybuild/easyconfigs/p/pkgconf/pkgconf-1.9.3-GCCcore-12.2.0.eb failed (err: "build failed (first 300 chars): Module command '/usr/share/lmod/lmod/libexec/lmod python show GCCcore/12.2.0' failed with exit code 1; stderr: Lmod has detected the following error: Unable to load module because of error when evaluating modulefile:\n     /home/casparl/eessi/versions/2023.06/software/linux/x86_64/amd/zen4/modules/al")

Maybe we can make that pre-prepare hook do nothing, or skip the prepare phase. Not sure...
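
One possible way around this failure, sketched here under assumptions rather than taken from this PR's code: guard the LmodError call in the generated module with an environment variable, so that a build hook can set the variable while EasyBuild itself needs to load the module and unset it afterwards. The variable name `EXAMPLE_IGNORE_ZEN4_GCC1220` and the helper `guarded_lmod_error_footer` are hypothetical; `LmodError` and `os.getenv` are the standard Lmod and Lua functions.

```python
# Hypothetical variable name, chosen for illustration only.
IGNORE_ENVVAR = "EXAMPLE_IGNORE_ZEN4_GCC1220"


def guarded_lmod_error_footer(message: str, ignore_envvar: str = IGNORE_ENVVAR) -> str:
    """Return Lua code that raises LmodError unless `ignore_envvar` is set.

    The message is wrapped in Lua long brackets ([==[ ... ]==]) so that quotes
    and newlines in it do not need escaping.
    """
    if "]==]" in message:
        # The closing long bracket inside the message would end the Lua string early.
        raise ValueError("message must not contain ']==]'")
    return (
        f'if (not os.getenv("{ignore_envvar}")) then\n'
        f"    LmodError([==[{message}]==])\n"
        "end\n"
    )


if __name__ == "__main__":
    print(guarded_lmod_error_footer(
        "EasyConfigs using toolchains based on GCCcore-12.2.0 are not supported "
        "for the Zen4 architecture."
    ))
```

A string like this could be appended to a module file as a Lua footer. With the variable set in the build environment, loading or showing the module during the build would presumably no longer trip the error, while regular users (who do not set it) would still see the message.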

eessi-bot bot commented Dec 10, 2024

Instance eessi-bot-mc-aws is configured to build for:

  • architectures: x86_64/generic, x86_64/intel/haswell, x86_64/intel/skylake_avx512, x86_64/amd/zen2, x86_64/amd/zen3, aarch64/generic, aarch64/neoverse_n1, aarch64/neoverse_v1
  • repositories: eessi.io-2023.06-compat, eessi-hpc.org-2023.06-software, eessi-hpc.org-2023.06-compat, eessi.io-2023.06-software

@riscv-eessi-io-bot

Instance eessi-bot-riscv is configured to build for:

  • architectures: riscv64/generic
  • repositories: riscv.eessi.io-20240402

eessi-bot bot commented Dec 10, 2024

Instance eessi-bot-mc-azure is configured to build for:

  • architectures: x86_64/amd/zen4
  • repositories: eessi.io-2023.06-compat, eessi.io-2023.06-software

…e of the step-hooks, so we can unset it at the end
@casparvl
Collaborator Author

casparvl commented Dec 11, 2024

Ok, this PR is more or less ready, but we should create a known-issues page about the zen4 tree missing 2022b / GCCcore 12.2.0. I think it can be very simple and state that, because of issues observed with the OpenBLAS from that toolchain generation, we decided not to support it. It also makes sense: zen4 was only released at the end of 2022, so the 2022b stack would have had very little support for it.

@casparvl casparvl marked this pull request as ready for review December 11, 2024 16:57
@casparvl casparvl marked this pull request as draft December 11, 2024 16:58
@casparvl
Collaborator Author

Ok, I'll need EESSI/docs#357 to be merged first. Then, I'll put a link to that part of the docs in the LmodError.

@casparvl casparvl marked this pull request as ready for review January 7, 2025 09:33
@casparvl
Collaborator Author

casparvl commented Jan 7, 2025

To test this PR:

$ export EESSI_SOFTWARE_SUBDIR_OVERRIDE=x86_64/amd/zen4
$ module load EESSI/2023.06
$ mkdir -p /tmp/my_install_prefix
$ export EESSI_USER_INSTALL=/tmp/my_install_prefix
$ module load EESSI-extend/2023.06-easybuild
$ # Proving this works when installing GCCcore 12.2.0 itself:
$ # Pass an existing source path with a proper config.guess - redownloading it seems broken today
$ eb GCCcore-12.2.0.eb --hooks software-layer/eb_hooks.py --sourcepath $HOME/.local/easybuild/sources
$ module av GCCcore/12.2.0
Lmod has detected the following error:  EasyConfigs using toolchains based on GCCcore-12.2.0 are not supported for the Zen4 architecture.
See https://www.eessi.io/docs/known_issues/eessi-2023.06/#gcc-1220-and-foss-2022b-based-modules-cannot-be-loaded-on-zen4-architecture
While processing the following module(s):
    Module fullname  Module Filename
    ---------------  ---------------
    GCCcore/12.2.0   /tmp/caspar_eessi_test/versions/2023.06/software/linux/x86_64/amd/zen4/modules/all/GCCcore/12.2.0.lua
$ # Trying something that only has GCCcore 12.2.0 as dependency:
$ eb libyaml-0.2.5-GCCcore-12.2.0.eb --hooks EESSI/software-layer/eb_hooks.py --sourcepath $HOME/.local/easybuild/sources
$ module load libyaml/0.2.5-GCCcore-12.2.0
Lmod has detected the following error:  EasyConfigs using toolchains based on GCCcore-12.2.0 are not supported for the Zen4 architecture.
See https://www.eessi.io/docs/known_issues/eessi-2023.06/#gcc-1220-and-foss-2022b-based-modules-cannot-be-loaded-on-zen4-architecture
While processing the following module(s):
    Module fullname               Module Filename
    ---------------               ---------------
    GCCcore/12.2.0                /tmp/caspar_eessi_test/versions/2023.06/software/linux/x86_64/amd/zen4/modules/all/GCCcore/12.2.0.lua
    libyaml/0.2.5-GCCcore-12.2.0  /tmp/caspar_eessi_test/versions/2023.06/software/linux/x86_64/amd/zen4/modules/all/libyaml/0.2.5-GCCcore-12.2.0.lua
$ # Trying something that has a lot of dependencies and thus uses --robot:
$ eb SciPy-bundle-2023.02-gfbf-2022b.eb --hooks EESSI/software-layer/eb_hooks.py --sourcepath $HOME/.local/easybuild/sources --robot
$ module load SciPy-bundle/2023.02-gfbf-2022b
Lmod has detected the following error:  EasyConfigs using toolchains based on GCCcore-12.2.0 are not supported for the Zen4 architecture.
See https://www.eessi.io/docs/known_issues/eessi-2023.06/#gcc-1220-and-foss-2022b-based-modules-cannot-be-loaded-on-zen4-architecture
While processing the following module(s):
    Module fullname                  Module Filename
    ---------------                  ---------------
    GCCcore/12.2.0                   /tmp/caspar_eessi_test/versions/2023.06/software/linux/x86_64/amd/zen4/modules/all/GCCcore/12.2.0.lua
    GCC/12.2.0                       /tmp/caspar_eessi_test/versions/2023.06/software/linux/x86_64/amd/zen4/modules/all/GCC/12.2.0.lua
    gfbf/2022b                       /tmp/caspar_eessi_test/versions/2023.06/software/linux/x86_64/amd/zen4/modules/all/gfbf/2022b.lua
    SciPy-bundle/2023.02-gfbf-2022b  /tmp/caspar_eessi_test/versions/2023.06/software/linux/x86_64/amd/zen4/modules/all/SciPy-bundle/2023.02-gfbf-2022b.lua
$ # Trying to load one of the dependencies installed for SciPy-bundle:
$ module load Python/3.10.8-GCCcore-12.2.0
Lmod has detected the following error:  EasyConfigs using toolchains based on GCCcore-12.2.0 are not supported for the Zen4 architecture.
See https://www.eessi.io/docs/known_issues/eessi-2023.06/#gcc-1220-and-foss-2022b-based-modules-cannot-be-loaded-on-zen4-architecture
While processing the following module(s):
    Module fullname               Module Filename
    ---------------               ---------------
    GCCcore/12.2.0                /tmp/caspar_eessi_test/versions/2023.06/software/linux/x86_64/amd/zen4/modules/all/GCCcore/12.2.0.lua
    Python/3.10.8-GCCcore-12.2.0  /tmp/caspar_eessi_test/versions/2023.06/software/linux/x86_64/amd/zen4/modules/all/Python/3.10.8-GCCcore-12.2.0.lua

I think all of these look as expected and intended. With that, I think this PR is ready for a final review (and hopefully merge :))
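
The outputs above show the error being raised for GCCcore/12.2.0 itself, for GCC/12.2.0 and gfbf/2022b, and for software built on top of them. A minimal sketch of a check that would classify an easyconfig this way follows. It is an illustration, not the code from eb_hooks.py: the function name and argument names are hypothetical, and including `gompi` and `foss` alongside `gfbf` is an assumption based on the 2022b toolchain generation mentioned in the linked known-issues page.

```python
# Toolchain/software names that share the GCCcore 12.2.0 base (assumed list).
GCC_BASED_NAMES = ("GCCcore", "GCC")
FOSS_BASED_NAMES = ("gfbf", "gompi", "foss")


def is_gcccore_1220_based(name: str, version: str, tc_name: str, tc_version: str) -> bool:
    """Return True if the software itself, or its toolchain, is GCCcore-12.2.0-based."""
    # Case 1: the easyconfig installs the compiler or toolchain itself.
    if name in GCC_BASED_NAMES and version == "12.2.0":
        return True
    if name in FOSS_BASED_NAMES and version == "2022b":
        return True
    # Case 2: the easyconfig is built with such a compiler or toolchain.
    if tc_name in GCC_BASED_NAMES and tc_version == "12.2.0":
        return True
    if tc_name in FOSS_BASED_NAMES and tc_version == "2022b":
        return True
    return False


if __name__ == "__main__":
    # Examples taken from the module names in the test output above.
    print(is_gcccore_1220_based("libyaml", "0.2.5", "GCCcore", "12.2.0"))
    print(is_gcccore_1220_based("SciPy-bundle", "2023.02", "gfbf", "2022b"))
```

A hook could call a check like this with the name, version, and toolchain of the easyconfig being processed, and only add the LmodError to modules for which it returns True.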

Member

@ocaisa ocaisa left a comment

LGTM

@ocaisa ocaisa merged commit a2c8ffb into EESSI:2023.06-software.eessi.io Jan 10, 2025
50 checks passed
eessi-bot bot commented Jan 10, 2025

PR merged! Moved [] to /project/def-users/SHARED/trash_bin/EESSI/software-layer/2025.01.10

@bedroge
Collaborator

bedroge commented Jan 10, 2025

@casparvl @ocaisa shouldn't this be actually "built" and ingested?

@ocaisa
Member

ocaisa commented Jan 10, 2025

bot: build repo:eessi.io-2023.06-software arch:aarch64/neoverse_n1

eessi-bot bot commented Jan 10, 2025

Updates by the bot instance eessi-bot-mc-aws (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:aarch64/neoverse_n1 from ocaisa
    • expanded format: build repository:eessi.io-2023.06-software architecture:aarch64/neoverse_n1

eessi-bot bot commented Jan 10, 2025

Updates by the bot instance eessi-bot-mc-azure (click for details)
  • received bot command build repo:eessi.io-2023.06-software arch:aarch64/neoverse_n1 from ocaisa
    • expanded format: build repository:eessi.io-2023.06-software architecture:aarch64/neoverse_n1
  • handling command build repository:eessi.io-2023.06-software architecture:aarch64/neoverse_n1 resulted in:
    • no jobs were submitted

eessi-bot bot commented Jan 10, 2025

error: eb_hooks.py: patch does not apply

Unable to download or merge changes between the source branch and the destination branch.
Tip: This can usually be resolved by syncing your branch and resolving any merge conflicts.

@boegel
Contributor

boegel commented Jan 10, 2025

We can take care of the ingest via #608 which is now ready to deploy & merge?
